The role of input noise in transcriptional regulation
Even under constant external conditions, the expression levels of genes fluctuate. Much emphasis
has been placed on the components of this noise that are due to randomness in transcription and 
translation; here we analyze the role of noise associated with the inputs to transcriptional regulation, 
the random arrival and binding of transcription factors to their target sites along the genome. This 
noise sets a fundamental physical limit to the reliability of genetic control, and has clear signatures, 
but we show that these are easily obscured by experimental limitations and even by conventional 
methods for plotting the variance vs. mean expression level. We argue that simple, global models 
of noise dominated by transcription and translation are inconsistent with the embedding of gene 
expression in a network of regulatory interactions. Analysis of recent experiments on transcriptional 
control in the early Drosophila embryo shows that these results are quantitatively consistent with the 
predicted signatures of input noise, and we discuss the experiments needed to test the importance 
of input noise more generally.

I. INTRODUCTION

A number of recent experiments have focused attention 
on noise in gene expression [1-9]. The
study of noise in biological systems more generally has a 
long history, with two very different streams of thought. 
On the one hand, observations of noise in behavior at 
the cellular or even organismal level give us a window 
into mechanisms at a much more microscopic level. The 
classic example of using noise to draw inferences about 
biological mechanism is perhaps the Luria-Delbrück
experiment [10], which demonstrated the random character
of mutations, but one can also point to early work on the
nature of chemical transmission at synapses and
on the dynamics of ion channel proteins.
On the other hand, noise limits the reliability of biologi- 
cal function, and it is important to identify these limits. 
Examples include tracking the reliability of visual
perception at low light levels down to the ability of the
visual system to count single photons [13], the implications
of channel noise for the reliability of neural coding
[19, 20, 21], and the approach of bacterial chemotactic
performance to the limits set by the random arrival of
individual molecules at the cell surface [22].

After demonstrating that one can observe noise in gene 
expression, most investigators have concentrated on the 
mechanistic implications of this noise. Working back- 
ward from the observation of protein concentrations, one 
can try to find the components of noise that derive from 
the translation of messenger RNA into protein, or the 
components that arise from noise in the transcription 
and degradation of the mRNA itself. At least in some or- 
ganisms, a single mRNA transcript can give rise to many 
protein molecules, and this 'burst' both amplifies the fluc- 
tuations in mRNA copy number and changes their statis- 
tics, so that even if the number of mRNA copies obeys 
the Poisson distribution the number of protein molecules 
will not [23]. This discussion parallels the understanding
that Poisson arrival of photons at the retina generates
non-Poisson statistics of action potentials in retinal
ganglion cells because each photon triggers a burst of spikes
[24]. Recent large scale surveys of noise in eukaryotic
transcription have suggested that the noise in most
protein levels can be understood in terms of this picture, so
that the fractional variance in the number of proteins $p_i$
expressed from gene $i$ is given by

$$\eta_i^2 \equiv \frac{\langle(\delta p_i)^2\rangle}{\langle p_i\rangle^2} = \frac{b}{\langle p_i\rangle}, \tag{1}$$

where $b \sim 10^3$ is the burst size, and is approximately
constant for all genes [9].

The mechanistic focus on noise in transcription vs 
translation perhaps misses the functional role of gene ex- 
pression as part of a regulatory network. Almost all genes 
are subject to transcriptional regulation, and hence the 
expression level of a particular protein can be viewed as 
the cell's response to the concentration of the relevant 
transcription factors. Seen in this way, transcription and 
translation are at the 'output' side of the response, and 
the binding of transcription factors to their targets along 
the genome is at the 'input' side (Fig 1). Noise can arise
at both the input and output, and while fluctuations in
transcription factor concentration could be viewed as an
extrinsic source of noise [2], there will be fluctuations
in target site occupancy even at fixed transcription
factor concentration [26, 27, 28]. There is a physical limit
to how much the impact of these input fluctuations can
be reduced, essentially because any physical device that
responds to changes in concentration is limited by shot
noise in the diffusive arrival of the relevant molecules at
their target sites.

In this paper we revisit the relative contributions of
input and output noise. Input noise has a clear signature,
namely that its impact on the output protein concentration
peaks at an intermediate value of the input transcription
factor concentration. The analogous signature
was essential, for example, in identifying the noise from
random opening and closing of individual ion channels
in neurons [13]. Perhaps surprisingly, we show that
this signature is easily obscured in conventional ways of
plotting the data on noise in gene expression. Recent
experiments on the regulation of Hunchback expression by
Bicoid in the early Drosophila embryo [32, 33] are
consistent with the predicted signature of input noise, and
(although there are caveats) a quantitative analysis of
these data supports a dominant contribution of diffusive
shot noise. We discuss what experiments would be
required to test this conclusion more generally. We begin,
however, by asking whether any simple global model such
as Eq (1) can be consistent with the embedding of gene
expression in a network of regulatory interactions.



II. GLOBAL CONSISTENCY? 

Consider a gene $i$ which is regulated by several
transcription factors. In steady state, the mean number of
these proteins in the cell will be a function of the copy
numbers of all the relevant transcription factors:

$$\langle p_i\rangle = g_i(p_1, p_2, \ldots, p_K). \tag{2}$$

If the copy numbers of the transcription factors fluctuate,
this noise will propagate through the input/output
relation $g_i$ [34], so that

$$\langle(\delta p_i)^2\rangle = \sum_{\mu=1}^{K}\sum_{\nu=1}^{K} \frac{\partial g_i}{\partial p_\mu}\frac{\partial g_i}{\partial p_\nu}\langle\delta p_\mu\,\delta p_\nu\rangle + \langle(\delta p_i)^2\rangle_{\rm int}, \tag{3}$$

where we include the intrinsic noise $\langle(\delta p_i)^2\rangle_{\rm int}$ that occurs
at fixed transcription factor levels.

FIG. 1: A simple model for transcriptional regulation.
Transcription factor is present at an average concentration $c$,
diffusing freely with diffusion constant $D$; it can bind to the
binding site of linear dimension $a$ and the fractional
occupancy of this site is $n \in [0, 1]$. Binding occurs with a second
order rate constant $k_+$, and unbinding occurs with a first
order rate constant $k_-$. When the site is bound, the mRNA are
transcribed at rate $R_e$ and degraded with rate $\tau_e^{-1}$, resulting
in a number of transcripts $e$. Proteins are translated from
each mRNA molecule with rate $R_p$ and degraded with rate
$\tau_p^{-1}$, resulting in a copy number $p$.

If the noise in gene expression is dominated by the
processes of transcription and translation, and if the
transcription factors are not regulating each other, then the
correlations between fluctuations in the copy numbers of
different proteins will be very small, so we expect that

$$\langle\delta p_\mu\,\delta p_\nu\rangle = \delta_{\mu\nu}\langle(\delta p_\mu)^2\rangle. \tag{4}$$

This allows us to simplify the propagation of noise in Eq
(3) to give

$$\langle(\delta p_i)^2\rangle = \sum_{\mu=1}^{K}\left(\frac{\partial g_i}{\partial p_\mu}\right)^2\langle(\delta p_\mu)^2\rangle + \langle(\delta p_i)^2\rangle_{\rm int}. \tag{5}$$

If, as in Eq (1), we express the noise in protein copy
number as a fractional noise $\eta$, then this becomes

$$\eta_i^2 = \sum_{\mu=1}^{K}\left(\frac{\partial\ln g_i}{\partial\ln p_\mu}\right)^2\eta_\mu^2 + \eta_{i,{\rm int}}^2. \tag{6}$$

In particular, this means that there is a minimum level
of noise,

$$\eta_i^2 \geq \sum_{\mu=1}^{K}\left(\frac{\partial\ln g_i}{\partial\ln p_\mu}\right)^2\eta_\mu^2. \tag{7}$$

But if the fractional variance in protein copy number has
a simple, global relation to the mean copy number, as in
Eq (1) [9], then this simplifies still further:

$$\frac{b}{\langle p_i\rangle} \geq \sum_{\mu=1}^{K}\left(\frac{\partial\ln g_i}{\partial\ln p_\mu}\right)^2\frac{b}{\langle p_\mu\rangle}, \tag{8}$$

$$\sum_{\mu=1}^{K}\left(\frac{\partial\ln g_i}{\partial\ln p_\mu}\right)^2\frac{\langle p_i\rangle}{\langle p_\mu\rangle} \leq 1. \tag{9}$$

Since the proteins labeled by the indices $\mu$ represent
transcription factors, usually present at low concentrations,
and the protein $i$ is a regulated gene (such as a
structural or metabolic protein) but not a transcription
factor itself, one expects that $\langle p_i\rangle/\langle p_\mu\rangle \gg 1$. But then
we have

$$\sum_{\mu=1}^{K}\left(\frac{\partial\ln g_i}{\partial\ln p_\mu}\right)^2 \ll 1. \tag{10}$$



Since this inequality constrains the sum of squares of
terms, each must be much smaller than one. This means
that when we make a small change in the concentration of
any transcription factor, the response of the regulated
gene must be much less than proportional. In this sense,
the assumption of a simple global description for the level
of noise in gene expression, Eq (1), leads us to the
conclusion that transcriptional "regulation" can't really be
very effective, and this must be wrong. Notice that this
problem is independent of the burst size $b$, and hence
doesn't depend on whether the noise is dominated by
transcription or translation.
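
As a rough numerical illustration of this constraint, the sketch below assumes that the global model requires the sum over transcription factors of (logarithmic gain)^2 times the ratio of mean copy numbers (target over regulator) to be at most one. The function names and the copy numbers are hypothetical and are not taken from the experiments discussed here.

```python
import math


def sensitivity_budget(mean_target, mean_tfs, log_gains):
    """Sum over transcription factors of gain^2 * <p_i>/<p_mu>.

    Under the assumed global noise model this sum must not exceed 1.
    """
    return sum(g * g * mean_target / m for g, m in zip(log_gains, mean_tfs))


def max_single_gain(mean_target, mean_tf):
    """Largest |d ln g_i / d ln p_mu| allowed if one factor uses the whole budget."""
    return math.sqrt(mean_tf / mean_target)


# Hypothetical numbers: a target at 10^4 copies, one regulator at 10^2 copies.
bound = max_single_gain(1e4, 1e2)
budget = sensitivity_budget(1e4, [1e2], [1.0])  # a merely proportional response
print(f"largest allowed logarithmic gain: {bound:.3f}")
print(f"budget used by a proportional response: {budget:.1f} (must be <= 1)")
```

Note that the burst size does not appear anywhere in the calculation, which mirrors the remark that the problem is independent of $b$.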

Our conclusion from the inequality in Eq (10) is
that we should re-examine the original hypothesis about
noise, Eq (1). An alternative is that this hypothesis is
correct, but that there are subtle correlations among all 
the protein copy number fluctuations of all the different 
transcription factors. If we want the global output model 
to be correct, these correlations would have to take on 
a very special form — different transcription factors reg- 
ulating a single gene would have to be correlated in a 
way that matches their impact on the expression of that 
gene — which seems implausible but would be very inter- 
esting if it were true. 



III. SOURCES OF NOISE 

Figure 1 makes clear that the concentration of a
protein can fluctuate for many reasons. The processes of
synthesis and degradation of the protein molecules them- 
selves are discrete and stochastic, as are the synthesis and 
degradation of mRNA molecules; together these consti- 
tute the "output noise" which has been widely discussed. 
But if we are considering a gene whose transcription is 
regulated, we need a microscopic model for this pro- 
cess. For the case of a transcriptional activator, there 
are binding sites for the transcription factors upstream 
of the regulated gene, and when these sites are occupied 
transcription proceeds at some rate, but when the site is 
empty transcription is inhibited. Because there are only 
a small number of relevant binding sites (in the simplest 
case, just one), the occupancy of these sites must fluctu- 
ate, and this random switching is an additional source of 
noise. In addition, the binding of transcription factors to 
their target sites along the genome depends on the con- 
centration in the immediate neighborhood of these sites, 
and this fluctuates as molecules diffuse into and out of 
the neighborhood. 

All of the different processes described above and
schematized in Fig 1 can be treated analytically using
Langevin methods, and the predictions of this analysis
can be tested against detailed stochastic simulations.
The details of the analysis are given in Appendix A.
Notice that variations in cell size, protein sorting in cell
division, fluctuations in RNA polymerase and ribosome
concentrations, and all other extrinsic contributions to
the noise are neglected.

When the dust settles, the variance in protein copy
number can be written as a sum of three terms, which
correspond to the output, switching, and diffusion noise.
To set the scale, we express the copy number as a fraction
of its maximum possible mean value, $p_0$, which is reached
at high concentrations of the transcriptional activator. In
these units, we find

$$\left(\frac{\sigma_p}{p_0}\right)^2 = \frac{1 + R_p\tau_e}{p_0}\,\bar{p} + \frac{\bar{p}(1-\bar{p})^2}{k_-\tau_p} + \frac{\bar{p}^2(1-\bar{p})^2}{\pi D a c\,\tau_p}, \tag{11}$$

where $\bar{p} = \langle p\rangle/p_0$ is the protein copy number expressed
as a fraction of its maximal value, $c$ is the concentration
of the transcription factor, and other parameters are as
explained in Fig 1.
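
To make the relative size of the three contributions concrete, here is a minimal sketch that evaluates them separately. It assumes the output, switching and diffusion terms take the forms $(1+R_p\tau_e)\bar{p}/p_0$, $\bar{p}(1-\bar{p})^2/(k_-\tau_p)$ and $\bar{p}^2(1-\bar{p})^2/(\pi D a c\,\tau_p)$; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def noise_terms(p_bar, p0, burst, k_off, tau_p, D, a, c):
    """Return (output, switching, diffusion) contributions to (sigma_p/p0)^2.

    p_bar : mean expression as a fraction of its maximum (0..1)
    p0    : maximal mean copy number
    burst : proteins made per mRNA, R_p * tau_e
    k_off : unbinding rate k_- (1/s);  tau_p : protein lifetime (s)
    D     : diffusion constant (um^2/s);  a : binding site size (um)
    c     : transcription factor concentration (molecules/um^3)
    """
    output = (1.0 + burst) * p_bar / p0
    switching = p_bar * (1.0 - p_bar) ** 2 / (k_off * tau_p)
    diffusion = (p_bar * (1.0 - p_bar)) ** 2 / (math.pi * D * a * c * tau_p)
    return output, switching, diffusion


# Hypothetical parameter values, for illustration only.
for p_bar in (0.1, 0.5, 1.0):
    out, sw, dif = noise_terms(p_bar, p0=1000.0, burst=4.0, k_off=0.1,
                               tau_p=600.0, D=1.0, a=0.003, c=5.0)
    print(f"p_bar={p_bar:.1f}: output={out:.2e} switching={sw:.2e} diffusion={dif:.2e}")
```

Both input terms vanish at full expression, so only the output term survives there.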

The first term in Eq (11) is the output noise and has
a Poisson-like behavior, with variance proportional to
the mean, but the proportionality constant differs from
1 by $R_p\tau_e$, i.e. the burst size or the number of proteins
produced per mRNA [23]. This is just the simple model
of Eq (1), with $b = 1 + R_p\tau_e$.

The second term in Eq (11) originates from binomial
"switching" as the transcription factor binding site
occupation fluctuates, and is most closely analogous to the
noise from random opening and closing of ion channels.
This term will be small for unbinding rates $k_-$ that are
fast compared to the protein lifetime, but might be large
for factors that take a long time to equilibrate or that
form energetically stable complexes on their promoters.

The third term in Eq (11) arises because the diffusive
flux of transcription factor molecules to the binding
site fluctuates at low input concentration $c$; in effect the
receptor site "counts" the number of molecules arriving
into its vicinity during a time window $\tau_p$, and this number
is of the order $\sim Dac\tau_p$. This argument is conceptually
the same as that for the limits to chemoattractant detection
in chemotaxis, as discussed by Berg and Purcell [22].
It can be shown that this is a theoretical noise floor that
cannot be circumvented by using sophisticated "binding
site machinery" as long as this machinery is contained
within a region of linear size $a$ [27, 29]. For example,
cooperative binding to the promoter or promoters with
multiple internal states will modify the binomial switching
term, but will leave the diffusion noise unaffected if
we express it as an effective noise in transcription factor
concentration $\sigma_c$ such that

$$\left(\frac{\sigma_c}{c}\right)^2 = \frac{1}{\pi D a c\,\tau_p}. \tag{12}$$



FIG. 2: Expression noise as a function of the mean. The
standard deviation of the protein concentration $\sigma_p/p_0$ is plotted
against the mean protein concentration $\bar{p} = \langle p\rangle/p_0$, from Eq
(14) with $h = 5$. In all cases the output noise term has a
strength $\alpha = 0.01$, and the different curves are indexed by
the ratio of input noise to output noise $\beta/\alpha = 0, 10, 20, 30$.
In the absence of input noise, the noise level is a monotonic
function of the mean, but input noise contributes a peak near
the point of half maximal expression $\bar{p} = 0.5$. In the inset, we
show the same results plotted as a fractional noise variance
$\eta_p^2$ vs the mean [Eq (15)], on a logarithmic scale, and we see
that the prominent peak has become just an inflection. For
most of the dynamic range of means, the contribution of input
noise is to increase the fractional variance without substantial
changes in the slope of the double-log plot, so that we can
confuse input noise with a larger level of output noise,
especially if we remember that real data will be scattered due to
measurement errors.

Although cooperativity does not change the effective
concentration noise due to diffusion, it does reduce the
relative significance of the switching noise [29]. Since
we will discuss a system which is strongly cooperative, in
much of what follows we neglect the switching noise term
and focus on the output noise and diffusion noise. Then
the generalization to multisite, cooperative regulation is
straightforward (see Appendix B). We expect that
cooperative effects among $h$ transcription factors generate a
sigmoidal dependence of expression on the transcription
factor concentration, so that

$$\bar{p} = \frac{c^h}{c^h + K_d^h}, \tag{13}$$

where $h$ is called the Hill coefficient, and $K_d$ is the
concentration required for half maximal activation. We
can invert this relationship to write the concentration $c$,
which is relevant for the diffusive noise, as a function
of the mean fractional expression level $\bar{p}$. Substituting
back into Eq (11), and neglecting the switching noise, we
obtain

$$\left(\frac{\sigma_p}{p_0}\right)^2 = \alpha\bar{p} + \beta\,\bar{p}^{\,2-1/h}(1-\bar{p})^{2+1/h}, \tag{14}$$

where $\alpha$ and $\beta$ are combinations of parameters that
measure the strength of the output and diffusion noise,
respectively. If we express the variance in fractional terms,
this becomes

$$\eta_p^2 \equiv \frac{\sigma_p^2}{\langle p\rangle^2} = \frac{\alpha}{\bar{p}} + \beta\,\bar{p}^{-1/h}(1-\bar{p})^{2+1/h}. \tag{15}$$

The global output noise model of Eq (1) corresponds to
$\beta = 0$ (no input noise) and $b = \alpha p_0$. Figure 2 shows
the predicted noise levels for different ratios of input to
output noise ($\beta/\alpha$).
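
The qualitative behavior described in the Fig. 2 caption can be checked with a short sketch. It assumes the noise model has the form $(\sigma_p/p_0)^2 = \alpha\bar{p} + \beta\,\bar{p}^{2-1/h}(1-\bar{p})^{2+1/h}$ and uses the parameter values quoted in the caption ($\alpha = 0.01$, $h = 5$, $\beta/\alpha = 0, 10, 20, 30$); the function names and the grid search are illustrative choices, not the authors' procedure.

```python
def noise_variance(p_bar, alpha, beta, h):
    """(sigma_p/p0)^2 = alpha*p + beta * p^(2-1/h) * (1-p)^(2+1/h)."""
    return alpha * p_bar + beta * p_bar ** (2 - 1 / h) * (1 - p_bar) ** (2 + 1 / h)


def peak_location(alpha, beta, h, n=1000):
    """Grid search for the mean expression level at which the noise is largest."""
    grid = [i / n for i in range(1, n + 1)]  # p_bar in (0, 1]
    return max(grid, key=lambda p: noise_variance(p, alpha, beta, h))


alpha, h = 0.01, 5
for ratio in (0, 10, 20, 30):
    beta = ratio * alpha
    p_star = peak_location(alpha, beta, h)
    sd = noise_variance(p_star, alpha, beta, h) ** 0.5
    print(f"beta/alpha={ratio:2d}: noise largest at p_bar={p_star:.2f}, "
          f"sigma_p/p0={sd:.3f}")
```

With no input noise the maximum sits at full expression, whereas a sufficiently large $\beta$ moves it to an intermediate expression level.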

For very highly cooperative, essentially switch-like
systems, we can take the limit $h \to \infty$ to obtain

$$\left(\frac{\sigma_p}{p_0}\right)^2 = \alpha\bar{p} + \beta\,\bar{p}^2(1-\bar{p})^2. \tag{16}$$

In particular, if we explore only expression levels well
below the maximum ($\bar{p} \ll 1$), then the diffusion noise
just adds a constant $\beta$ to the fractional variance,
$\eta_p^2 \approx \alpha/\bar{p} + \beta$. Thus,
diffusion noise in a highly cooperative system could be
confused with a global or even extrinsic noise source.



IV. SIGNATURES OF INPUT NOISE 

Input noise arises from fluctuations in the occupancy 
of the transcription factor binding sites. Thus, if we go 
to very high transcription factor concentrations, where 
all sites are fully occupied, or to very low concentrations, 
where the sites are never occupied, the fluctuations must 
vanish. These limits correspond, in the case of a tran- 
scriptional activator, to maximal and minimal expression 
levels, respectively. Thus, the key signature of input noise 
is that it must be largest at some intermediate expression 
level, as shown in Fig 2.

The claim that many genes have expression noise levels
which fit the global output noise model of Eq (1) would
seem to contradict the prediction of a peak in the noise
as a function of the mean. But if we plot the predictions
of the model with input noise as a fractional variance vs
mean, the prominent peak disappears (inset to Fig 2). In
fact, over a large dynamic range, the input noise seems
just to increase the magnitude of the fractional variance
while not making a substantial change in the slope of
$\log(\eta_p^2)$ vs $\log(\langle p\rangle)$. Confronted with real data on a
system with significant input noise, we could thus fit much
of those data with the global output noise model but
with a larger value of $b$. There is, of course, a difference
between input and output noise, even when plotted as
$\log(\eta_p^2)$ vs $\log(\langle p\rangle)$, namely a rapid drop in noise level
as we approach maximal expression. But this effect is
confined to a narrow range, essentially a factor of two in
mean expression level. As we discuss below, there are a
variety of reasons why this might not have been seen in
the data of Ref [9].
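
The way input noise can masquerade as a larger burst size is easy to demonstrate numerically. The sketch below assumes the fractional variance has the form $\eta_p^2 = \alpha/\bar{p} + \beta\,\bar{p}^{-1/h}(1-\bar{p})^{2+1/h}$, generates noiseless synthetic values below half maximal expression, and fits them with a pure output-noise model whose log-log slope is fixed at $-1$. The function names, parameter values and fitting procedure are illustrative assumptions.

```python
import math


def fractional_variance(p_bar, alpha, beta, h):
    """eta_p^2 = alpha/p + beta * p^(-1/h) * (1-p)^(2+1/h)."""
    return alpha / p_bar + beta * p_bar ** (-1 / h) * (1 - p_bar) ** (2 + 1 / h)


def fit_global_model(p_values, eta2_values):
    """Least-squares fit of log(eta^2) = log(A) - log(p_bar), slope fixed at -1.

    Returns A, the apparent output-noise strength (b/p0 in the text's notation).
    """
    logs = [math.log(e * p) for p, e in zip(p_values, eta2_values)]
    return math.exp(sum(logs) / len(logs))


# Synthetic, noiseless "data" on a logarithmic grid from p_bar = 0.01 to 0.5.
p_grid = [0.01 * 50 ** (i / 19) for i in range(20)]
alpha, h = 0.01, 5
for beta in (0.0, 0.1):
    eta2 = [fractional_variance(p, alpha, beta, h) for p in p_grid]
    apparent = fit_global_model(p_grid, eta2)
    print(f"beta={beta}: true alpha={alpha}, apparent alpha={apparent:.4f}")
```

Because the input-noise term is non-negative everywhere, the fitted output-noise strength can only be inflated, never reduced, by ignoring it.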

Recent experiments on the precision of gene expression
in the early Drosophila embryo provide us with an
opportunity to search for the signatures of input noise
[32, 33]. The embryo contains a spatial gradient of the
protein Bicoid (Bcd), translated from maternal mRNA,
and this protein is a transcription factor which activates,
among other genes, hunchback. Looking along the
anterior-posterior axis of the embryo one thus has an
array of nuclei that experience a graded range of
transcription factor concentrations. Using antibody staining and
image processing methods, it thus is possible to collect
thousands of points on a scatter plot of input (Bicoid
concentration) vs. output (Hunchback protein
concentration); since even in a single embryo there are many
nuclei that have the same Bcd concentration, one can
examine both the mean Hunchback (Hb) response and
its variance; data from Ref [33] are shown in Fig 3.

FIG. 3: The input-output relation for Bicoid regulation of
Hunchback expression, redrawn from Ref [33]. Dashed curves
show mean expression levels in different embryos, thick black
line is the mean across all embryos, and points with error bars
show the mean and standard deviation of Hb expression at a
given Bcd concentration in one embryo.

The mean response of Hb to Bcd is fit reasonably well
by Eq (13) with a Hill coefficient $h = 5$, and in Fig
4 we replot the noise in this response as a function of
the mean. The peak of expression noise near half maximal
expression, the signature of input noise, is clearly
visible. More quantitatively, we find that the data are
well fit by Eq (14) with the contribution from output
noise ($\alpha \approx 1/380$) much smaller than that from input
noise ($\beta \approx 1/2$). We also consider the same model with
$h \to \infty$, and this fully switch-like model, although
formally still within error bars, systematically deviates from
the data. Finally we consider a model in which diffusion
noise is absent, but we include the switching noise
from Eq (11), which generalizes to the case of cooperative
binding (see Appendix B). Interestingly, this model
has the same number of parameters as the diffusion noise
model, but does a significantly poorer job of fitting the
data. While the fit can be improved further by adding
a small background to the noise, we emphasize that Eq
(14) correctly captures the non-trivial shape of the noise
curve with only two parameters. Because input noise falls
to zero at maximal expression, the sole remaining noise
at that point is the output noise, and this uniquely
determines the parameter $\alpha$. The strength of the input noise
($\beta$) then is determined by the height of the noise peak,
and there is no further room for adjustment. The shape
of the peak is predicted by the theory with no additional
parameters, and the different curves in Fig 4 demonstrate
that the data can distinguish among various functional
forms for the peak.

Are the parameters $\alpha$ and $\beta$ that fit the Bcd/Hb data
biologically reasonable? The fact that diffusive noise
dominates at intermediate levels of expression ($\beta \gg \alpha$) is
the statement that the Hunchback expression level
provides a readout of Bcd concentration with a reliability
that is close to the physical limit set by diffusional shot
noise, as was argued in Ref [33] based on the magnitude
of the noise level and estimates of the relevant microscopic
parameters that determine $\beta$. The dominance of
diffusive noise over switching noise presumably is related
to the high cooperativity of the Bcd/Hb input/output
relation [29].

FIG. 4: Standard deviation of Hunchback expression as a
function of the mean (points with error bars), replotted from
Ref [33]. The black line is a fit of combined output and
diffusion noise contributions, from Eq (14) with $h = 5$, and the
dashed red line is with $h \to \infty$, from Eq (16). In contrast,
the dashed blue line is the best fit of combined output and
switching noise contributions. Although both diffusion and
switching noise produce a peak at intermediate expression
levels, the shapes of the peaks are distinguishable, and the
data favor the diffusion noise model.

The parameter $\alpha$ measures the strength of the output
noise and thus depends on the absolute number
of Hb molecules and on the number of proteins produced
per mRNA transcript. If this burst size is in the range
$R_p\tau_e \sim 1-10$, then our fit predicts that the maximum
expression level of Hb corresponds to $p_0 = 700-4000$
molecules in the nucleus. Given the volume of the nuclei
at this stage of development ($\sim 140\,\mu{\rm m}^3$; see Refs
[33, 35]), this is a concentration of 8-48 nM. Although
we don't have independent measurements of the absolute
Hunchback concentration, this is reasonable for
transcription factors, which typically act in the nanomolar
range [36, 37, 38, 39, 40, 41], and can be compared
with the maximal nuclear concentration of Bcd, which
is 55 ± 3 nM [33]. Larger burst sizes would predict larger
maximal expression levels, or conversely measurements of
absolute expression levels might give suggestions about
the burst size for translation in the early Drosophila
embryo.
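
The conversion from copy number to concentration quoted above can be reproduced with a few lines. This is a sketch only: the relation $\alpha = (1 + R_p\tau_e)/p_0$ used in `max_copy_number` is an assumption based on the output-noise term, and both function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_to_nanomolar(copies, volume_um3):
    """Convert a copy number in a compartment of given volume (um^3) to nM."""
    litres = volume_um3 * 1e-15  # 1 um^3 = 1e-15 L
    return copies / (AVOGADRO * litres) * 1e9


def max_copy_number(alpha, burst):
    """p0 implied by the assumed relation alpha = (1 + burst)/p0."""
    return (1.0 + burst) / alpha


NUCLEAR_VOLUME_UM3 = 140.0  # value quoted in the text

# Range of copy numbers quoted in the text.
for copies in (700, 4000):
    nm = copies_to_nanomolar(copies, NUCLEAR_VOLUME_UM3)
    print(f"{copies} molecules -> {nm:.1f} nM")

# Copy numbers implied by alpha = 1/380 for burst sizes of 1 and 10.
for burst in (1, 10):
    p0 = max_copy_number(1 / 380, burst)
    nm = copies_to_nanomolar(p0, NUCLEAR_VOLUME_UM3)
    print(f"burst={burst}: p0 = {p0:.0f} molecules -> {nm:.1f} nM")
```

One nanomolar corresponds to roughly 84 molecules in a volume of 140 cubic microns, which is why a few thousand molecules per nucleus land in the tens of nanomolar.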



V. DISCUSSION 

In the process of transcriptional regulation, the (output)
expression level of regulated genes acts as a sensor
for the (input) concentration of transcription factors.
The performance of this sensor, and hence the regulatory
power of the system, is limited by noise. While changes
in the parameters of the transcriptional and translational
apparatus can change the level of output noise, the input
noise is determined by the physical properties of the
transcription factor and its interactions with the target sites
along the genome. Ultimately, there is a lower bound on
this input noise level set by the shot noise in random
arrival of the transcription factors at their targets, in much
the same way that any imaging process ultimately is
limited by the random arrival of photons.

FIG. 5: Logarithmic plot of fractional variance vs the mean
expression level for Hunchback, replotted from Ref [33]. Each
black point represents the noise level measured across nuclei
that experience the same Bcd concentration within one
embryo, and results are collected from nine embryos. The solid
line shows a fit to $\eta_p^2 \propto \langle p\rangle^{-\gamma}$ in the region below half maximal
mean expression; we find a good fit, with $\gamma = 1.04$, despite
the fact that these data show a clear signature of input noise
when plotted in Fig 4. Dashed line indicates the global noise
floor suggested in Ref [9], and red points show the raw data
with this variance added. Although the input noise still
appears as a drop in fractional noise level near maximal mean
expression, this now is quite subtle and easily obscured by
experimental errors.

Input and output noise seem to be so different that it is
hard to imagine that they could be confused experimentally.
Some of the difficulty, however, can be illustrated
by plotting the results from the Bcd/Hb experiments of
Ref [33] in the form which has become conventional in
the study of gene expression noise, as a fractional
variance vs mean expression level (Fig 5). The signature of
input noise, so clear in Fig 4, now is confined to a narrow
range ($\sim\times 2$) near maximal expression. In contrast, over
more than a decade of expression levels the noise level is
a good fit to $\eta_p^2 \propto \langle p\rangle^{-\gamma}$, with $\gamma = 1.04$ being very similar
to the prediction of the global noise model ($\gamma = 1$) in Eq
(1). The departures from power-law behavior are easily
obscured by global noise sources, experimental error, or
by technical limitations that lead to the exclusion of data
at the very highest expression levels, as in Ref [9].

The lesson from this analysis of the Bicoid/Hunchback 
data is that the signatures of input noise are surprisingly 
subtle. In this system, however, the behavior near half 
maximal expression is exactly the most relevant question 
biologically, since this is where the 'decision' is made to 
draw a boundary, as a first step in spatial patterning. 
In other systems, the details of noise in this region of 
expression levels might be less relevant for the organ- 
ism, but it is only in this region that different sources 
of noise are qualitatively distinguishable, as is clear from 
Fig 2. Thus, unless we have independent experiments to
measure some of the parameters of the system, we need 
experimental access to the full range of expression levels 
and hence, implicitly, to the full dynamic range of tran- 
scription factor concentrations, if we want to disentangle 
input and output noise. 

The early Drosophila embryo is an attractive model 
system precisely because the organism itself generates a 
broad range of transcription factor concentrations, and 
conveniently arranges these different samples along the 
major axes of the embryo. A caveat is that since we 
don't directly control the transcription factor concentra- 
tion, we have to measure it. In particular, in order to 
measure the variance of the output (Hunchback, in the 
present discussion) we have to find many nuclei that all 
have the same input transcription factor (Bicoid)
concentration. Because the mean output is a steep function of
the input, errors in the measurement of transcription
factor concentration can simulate the effects of input noise,
as discussed in Ref [33]. Thus, a complete analysis of
input and output noise requires not only access to a wide
range of transcription factor concentrations, but also rather
precise measurements of these concentrations.

Why are the different sources of noise so easily
confused? If noise is dominated by randomness in a single
step of the translation process, then the number of
protein molecules will obey the Poisson distribution, and the
variance in copy number will be equal to the mean. But 
if we can't actually turn measurements of protein level 
into molecule counts, then all we can say is that the vari- 
ance will be proportional to the mean. If the dominant 
noise source is a single step in transcription, then the 
number of mRNA transcripts will obey the Poisson dis- 
tribution, and the variance of protein copy numbers still 
will be proportional to the mean, but the proportionality 
constant will be enhanced by the burst size. The same 
reasoning, however, can be pushed further back: if, far 
from maximal expression, the dominant source of noise 
is the infrequent binding of a transcriptional activator 
(or dissociation of a repressor) to its target site, then 
the variance in protein copy number still will be propor- 
tional to the mean. Thus, the proportionality of variance 
to mean implies that there is some single rare event that 
dominates the noise, and by itself doesn't distinguish the 
nature of this event. 

If noise is dominated by regulatory events, then the 
number of mRNA transcripts should be drawn from a 
distribution broader than Poisson. In effect the idea 
of bursting, which amplifies protein relative to mRNA 
number variance, applies here too, amplifying the vari- 
ance of transcript number above the expectations from 
the Poisson distribution. Transcriptional bursting has 
in fact been observed directly, although it is not clear
whether this arises from fluctuations in transcription fac- 
tor binding or from other sources. 

Previous arguments have made it plausible that input 
noise is significant in comparison to the observed variance 
of gene expression [26], and we have shown here that
models which assign all of the noise to common factors 
on the output side are inconsistent with the embedding of 
gene expression in a regulatory network. The signatures 
of input noise seem clear, but can be surprisingly subtle 
to distinguish in real data. We have argued that the 
Bicoid/Hunchback system provides an example in which 
input noise is dominant, and further that the detailed 
form of the variance vs mean supports a dominant role 
for diffusion rather than switching noise. Although there 
are caveats, this is consistent with the idea that, as with 
other critical biological processes, the
regulation of gene expression can operate with a precision 
limited by fundamental physical principles.

APPENDIX A

We consider a simplified model of regulated gene ex- 
pression, as schematized in Fig ED 

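$$\frac{\partial c(\mathbf{x},t)}{\partial t} = D\nabla^2 c(\mathbf{x},t) - \dot{n}\,\delta(\mathbf{x}-\mathbf{x}_0) + S - \mathcal{D} \tag{A1}$$

$$\dot{n} = k_+ c(\mathbf{x}_0,t)\,(1-n) - k_- n + \xi_n \tag{A2}$$

$$\dot{e} = R_e n - \tau_e^{-1} e + \xi_e \tag{A3}$$

$$\dot{p} = R_p e - \tau_p^{-1} p + \xi_p. \tag{A4}$$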

Equation (A1) describes the diffusion of the transcription 
factor, which can be absorbed at or released from a binding 
site on the DNA located at $\mathbf{x}_0$. These transcription 
factors are produced at sources $S$ and degraded at sinks 
$\mathcal{D}$, which can both be spatially distributed and can also 
contribute to the noise in $c$. Equation (A2) describes the 
dynamics of the binding site occupancy; binding occurs 
with a second order rate constant $k_+$ and unbinding with 
a first order rate constant $k_-$, and the dissociation 
constant of the site is $K_d = k_-/k_+$. The Langevin term 
$\xi_n$ induces stochastic (binomial) switching between 
occupied and empty states of the site. Equations (A3) 
and (A4) describe the production and degradation of 
mRNA and protein, respectively, and include Langevin 
noise terms associated with these birth and death 
processes. 
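
As a rough illustration of the birth and death processes in Eqs (A2)-(A4), the following sketch replaces the Langevin description with a Gillespie-type stochastic simulation. It is not part of the analysis in the text: it assumes that the concentration at the binding site is held fixed at a value c, so the diffusion noise of Eq (A1) is deliberately left out, and the function name and all parameter values are hypothetical choices made only for illustration.

```python
import random

def simulate_expression(c, k_plus, k_minus, R_e, tau_e, R_p, tau_p,
                        t_end, seed=0):
    """Gillespie simulation of occupancy n, mRNA count e and protein count p.

    Assumption (not from the text): the concentration at the site is fixed
    at c, so only switching noise and output noise are simulated.
    Returns the time-averaged values of (n, e, p).
    """
    rng = random.Random(seed)
    t, n, e, p = 0.0, 0, 0, 0
    acc_n = acc_e = acc_p = 0.0
    while True:
        rates = (
            k_plus * c * (1 - n),  # binding of a transcription factor
            k_minus * n,           # unbinding
            R_e * n,               # transcription, only while the site is occupied
            e / tau_e,             # mRNA degradation
            R_p * e,               # translation
            p / tau_p,             # protein degradation
        )
        total = sum(rates)
        dt = rng.expovariate(total)      # waiting time to the next event
        last_step = t + dt >= t_end
        if last_step:
            dt = t_end - t               # truncate the final interval
        acc_n += n * dt
        acc_e += e * dt
        acc_p += p * dt
        if last_step:
            break
        t += dt
        # pick which event fires, with probability proportional to its rate
        r = rng.random() * total
        idx = 0
        while idx < 5 and r >= rates[idx]:
            r -= rates[idx]
            idx += 1
        if idx == 0:
            n = 1
        elif idx == 1:
            n = 0
        elif idx == 2:
            e += 1
        elif idx == 3:
            e -= 1
        elif idx == 4:
            p += 1
        else:
            p -= 1
    return acc_n / t_end, acc_e / t_end, acc_p / t_end

n_mean, e_mean, p_mean = simulate_expression(
    c=1.0, k_plus=1.0, k_minus=1.0, R_e=5.0, tau_e=1.0,
    R_p=2.0, tau_p=5.0, t_end=3000.0)
print(n_mean, e_mean, p_mean)
```

With these illustrative values the time averages should come out close to the steady state means of the model, although any single run fluctuates around them.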

This seems a good place to note that, while conven- 
tional, the assumption that transcription and translation 
are simple one step processes seems a bit strong. We 
hope to return to this point at another time. 

Our goal is to compute the variance in protein copy 
number, $\sigma_p^2(c)$. For simplicity we will assume that the 
transcription factors are present at a fixed total number 
in the cell and that they do not decay, $S = \mathcal{D} = 0$. We 
will see that even with this simplification, where the 
overall concentration of transcription factors does not 
fluctuate, we still get an interesting noise contribution from the 
randomness associated with diffusion in Eq (A1). 

Our basic strategy is to find the steady state solution 
of the model, and then linearize around this to compute 
the response of the variables $\{n, e, p\}$ to the various 
Langevin forces $\{\xi_n, \xi_e, \xi_p\}$. In the linear approximation, 
the steady states are also the mean values: 

$$\bar{c} = c \tag{A5}$$

$$\langle n\rangle = \frac{k_+ c}{k_+ c + k_-} = \frac{c}{c + K_d} \tag{A6}$$

$$\langle e\rangle = R_e\tau_e\langle n\rangle \tag{A7}$$

$$\langle p\rangle = R_p\tau_p\langle e\rangle = p_0\langle n\rangle, \tag{A8}$$

where $p_0 = R_e\tau_e R_p\tau_p$ is the maximum mean expression 
level. Notice that what we have called $\bar{p} = \langle p\rangle/p_0$ in the 
text is just the mean occupancy, $\langle n\rangle$, of the transcription 
factor binding site. 
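
The steady state relations, Eqs (A6)-(A8), can be written as a short function. This is only a sketch; the function name and the numerical values in the usage line are hypothetical, chosen so that the concentration equals the dissociation constant and the site is half occupied.

```python
def steady_state_means(c, k_plus, k_minus, R_e, tau_e, R_p, tau_p):
    """Mean occupancy, mRNA and protein numbers from Eqs (A6)-(A8)."""
    K_d = k_minus / k_plus              # dissociation constant of the site
    n_mean = c / (c + K_d)              # Eq (A6)
    e_mean = R_e * tau_e * n_mean       # Eq (A7)
    p_mean = R_p * tau_p * e_mean       # Eq (A8)
    p0 = R_e * tau_e * R_p * tau_p      # maximum mean expression level
    return {"n": n_mean, "e": e_mean, "p": p_mean, "p0": p0}

means = steady_state_means(c=2.0, k_plus=0.5, k_minus=1.0,
                           R_e=5.0, tau_e=1.0, R_p=2.0, tau_p=5.0)
print(means)
```

In this example the ratio of the mean protein number to p0 equals the mean occupancy, as noted in the text.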

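Small departures from steady state are written in a 
Fourier representation: 

$$c(\mathbf{x},t) = \bar{c} + \int\frac{d\omega}{2\pi}\int\frac{d^3k}{(2\pi)^3}\, e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)}\,\delta\tilde{c}(\mathbf{k},\omega) \tag{A9}$$

$$n(t) = \langle n\rangle + \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\delta\tilde{n}(\omega) \tag{A10}$$

$$e(t) = \langle e\rangle + \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\delta\tilde{e}(\omega) \tag{A11}$$

$$p(t) = \langle p\rangle + \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\delta\tilde{p}(\omega). \tag{A12}$$

Similarly, each of the Langevin terms is written in its 
Fourier representation, 

$$\xi_\mu(t) = \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\tilde{\xi}_\mu(\omega), \tag{A13}$$

where $\mu = n, e, p$. 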

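As a first step we use the Fourier representation to 
solve Eq (A1) for $\delta c(\mathbf{x}_0,t)$, which we need to substitute 
into Eq (A2) for the binding site occupancy: 

$$\delta c(\mathbf{x}_0,t) = \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\delta\tilde{c}(\mathbf{x}_0,\omega) \tag{A14}$$

$$\delta\tilde{c}(\mathbf{x}_0,\omega) = i\omega\,\delta\tilde{n}(\omega)\int\frac{d^3k}{(2\pi)^3}\,\frac{1}{-i\omega + D|\mathbf{k}|^2} \tag{A15}$$

$$\approx \frac{i\omega\,\delta\tilde{n}(\omega)}{\pi D a}. \tag{A16}$$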



The integral over $\mathbf{k}$ in Eq (A15) is divergent at large 
|k| (ultraviolet). This arises, as explained in Ref [2q . 
because we started with the assumption that the binding 
reaction occurs at a point, the delta function in Eq (A1). 
In fact our description needs to be coarse grained on a 
scale corresponding to the size of the binding site, so we 
introduce a cutoff so that $|\mathbf{k}| < k_{\max} = 2\pi/a$, where $a$ is 
the linear size of the binding site. 
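
The effect of the cutoff can be checked numerically. The sketch below evaluates the integral in Eq (A15) over |k| < 2 pi / a with a simple midpoint rule, and compares it with the value 1/(pi D a) that appears in Eq (A16). The function name, the step count and the parameter values are hypothetical; the comparison at nonzero frequency rests on the assumption that the frequencies of interest are small compared with D/a^2.

```python
import math

def cutoff_integral(omega, D, a, steps=20000):
    """Midpoint-rule estimate of the k integral in Eq (A15) for |k| < 2 pi / a."""
    k_max = 2.0 * math.pi / a
    dk = k_max / steps
    total = 0.0 + 0.0j
    for j in range(steps):
        k = (j + 0.5) * dk
        # angular integration gives 4 pi k^2; dividing by (2 pi)^3 leaves k^2 / (2 pi^2)
        weight = k * k / (2.0 * math.pi ** 2)
        total += weight / complex(D * k * k, -omega) * dk
    return total

D, a = 1.0, 1.0
limit = 1.0 / (math.pi * D * a)            # value used in Eq (A16)
print(cutoff_integral(0.0, D, a).real, limit)
print(cutoff_integral(0.01, D, a).real / limit)
```

At zero frequency the sum reproduces 1/(pi D a); at a small nonzero frequency the real part falls slightly below this value, which gives a feeling for the size of the terms neglected in Eq (A16).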

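Linearizing Eq (A2) for the dynamics of the site occupancy, we have 

$$-i\omega\,\delta\tilde{n}(\omega) = -(k_+ c + k_-)\,\delta\tilde{n}(\omega) + k_+(1-\langle n\rangle)\,\delta\tilde{c}(\mathbf{x}_0,\omega) + \tilde{\xi}_n(\omega). \tag{A17}$$

Substituting our result for $\delta\tilde{c}(\mathbf{x}_0,\omega)$ from Eq (A16), we find 

$$-i\omega\,\delta\tilde{n}(\omega) = -(k_+ c + k_-)\,\delta\tilde{n}(\omega) + k_+(1-\langle n\rangle)\,\frac{i\omega\,\delta\tilde{n}(\omega)}{\pi D a} + \tilde{\xi}_n(\omega) \tag{A18}$$

$$-i\omega\left[1 + \frac{k_+(1-\langle n\rangle)}{\pi D a}\right]\delta\tilde{n}(\omega) = -(k_+ c + k_-)\,\delta\tilde{n}(\omega) + \tilde{\xi}_n(\omega) \tag{A19}$$

$$\delta\tilde{n}(\omega) = \frac{\tilde{\xi}_n(\omega)}{-i\omega(1+\Sigma) + (k_+ c + k_-)}, \tag{A20}$$

where $\Sigma = k_+(1-\langle n\rangle)/(\pi D a)$. The linearization of Eqs (A3) and (A4) takes the form 

$$-i\omega\,\delta\tilde{e}(\omega) = -\frac{1}{\tau_e}\,\delta\tilde{e}(\omega) + R_e\,\delta\tilde{n}(\omega) + \tilde{\xi}_e(\omega) \tag{A21}$$

$$-i\omega\,\delta\tilde{p}(\omega) = -\frac{1}{\tau_p}\,\delta\tilde{p}(\omega) + R_p\,\delta\tilde{e}(\omega) + \tilde{\xi}_p(\omega). \tag{A22}$$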


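Each Langevin term is independent, and each frequency component $\omega$ is correlated only with the component at $-\omega$, 
defining the noise power spectrum $\langle\tilde{\xi}_\mu(\omega)\,\tilde{\xi}_\mu(-\omega')\rangle = 2\pi\,\delta(\omega-\omega')\,\mathcal{N}_\mu(\omega)$ for $\mu = n, e, p$. Solving the three linear 
equations, Eqs (A20)-(A22), we can find the power spectrum of the protein copy number fluctuations, 

$$S_p(\omega) = \frac{\mathcal{N}_p}{\omega^2 + 1/\tau_p^2} + \frac{R_p^2\,\mathcal{N}_e}{(\omega^2 + 1/\tau_p^2)(\omega^2 + 1/\tau_e^2)} + \frac{R_p^2 R_e^2\,\mathcal{N}_n}{(\omega^2 + 1/\tau_p^2)(\omega^2 + 1/\tau_e^2)\left[(1+\Sigma)^2\omega^2 + 1/\tau_c^2\right]}, \tag{A23}$$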



where $1/\tau_c = k_+ c + k_-$. This form has a very intuitive 
interpretation: each Langevin term represents a noise 
source; as this noise propagates from the point where it 
enters the dynamical system to the output, it is subjected 
both to the gain of each successive stage (prefactors $R$), and 
to filtering by factors of $\mathcal{F}_\tau = (\omega^2 + 1/\tau^2)^{-1}$. 



The total variance in protein copy number is given by 
an integral over the spectrum, 

$$\langle(\delta p)^2\rangle \equiv \sigma_p^2 = \int\frac{d\omega}{2\pi}\,S_p(\omega), \tag{A24}$$

and the noise power spectra of the Langevin terms associated 
with the mRNA and protein dynamics have the 
simple forms $\mathcal{N}_e(\omega) = 2R_e\langle n\rangle$ and $\mathcal{N}_p(\omega) = 2R_p\langle e\rangle$, 
respectively. The spectrum $\mathcal{N}_n(\omega)$ is more subtle. One way 
to derive it is to realize that since there is only one binding 
site and this site is either occupied or empty, the total 
variance of $\delta n$ must be given by the binomial formula, 

$$\langle(\delta n)^2\rangle = \langle n\rangle(1-\langle n\rangle). \tag{A25}$$

Starting with Eq (A20) and the analog of Eq (A24), we 
can use this condition to set the magnitude of $\mathcal{N}_n$. 
Alternatively, we can use the fact that binding and unbinding 
come to equilibrium, and hence the fluctuations in $n$ are 
a form of thermal noise, like Brownian motion or Johnson 
noise, and hence the spectrum $\mathcal{N}_n$ is determined by the 
fluctuation-dissipation theorem [26]. The result is that 

$$\mathcal{N}_n = \frac{2}{\tau_c}(1+\Sigma)\,\langle n\rangle(1-\langle n\rangle). \tag{A26}$$
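
As a consistency check on Eq (A26), one can integrate the occupancy spectrum implied by Eq (A20) over frequency and confirm that it returns the binomial variance of Eq (A25). The sketch below does this with a midpoint rule after a tangent substitution that maps the infinite frequency axis onto a finite interval; the function name, the substitution and the parameter values are choices made here for illustration and are not taken from the text.

```python
import math

def occupancy_variance(n_mean, tau_c, Sigma, steps=20000):
    """Integrate N_n / [(1+Sigma)^2 w^2 + 1/tau_c^2] over dw / (2 pi)."""
    N_n = 2.0 / tau_c * (1.0 + Sigma) * n_mean * (1.0 - n_mean)  # Eq (A26)
    w_c = 1.0 / ((1.0 + Sigma) * tau_c)    # corner frequency of the filter
    dtheta = math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = -0.5 * math.pi + (j + 0.5) * dtheta
        w = w_c * math.tan(theta)                      # substitution w = w_c tan(theta)
        dw = w_c / math.cos(theta) ** 2 * dtheta
        filt = 1.0 / ((1.0 + Sigma) ** 2 * w * w + 1.0 / tau_c ** 2)
        total += N_n * filt * dw / (2.0 * math.pi)
    return total

n_mean = 0.3
print(occupancy_variance(n_mean, tau_c=2.0, Sigma=0.7), n_mean * (1.0 - n_mean))
```

The two printed numbers should agree to numerical precision, independent of the values chosen for tau_c and Sigma.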



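For simplicity we consider the case where the protein 
lifetime $\tau_p$ is long compared with all other time scales in 
the problem. Then we can approximate Eq (A23) as 

$$S_p(\omega) \approx \frac{1}{\omega^2 + 1/\tau_p^2}\left[\mathcal{N}_p + (R_p\tau_e)^2\,\mathcal{N}_e + (R_p\tau_e R_e\tau_c)^2\,\mathcal{N}_n\right]. \tag{A27}$$

Substituting the forms of the individual noise spectra $\mathcal{N}_\mu$ 
and doing the integral over $\omega$ [Eq (A24)], we find the 
variance in protein copy number 

$$\sigma_p^2 = \tau_p\left[R_p\langle e\rangle + (R_p\tau_e)^2 R_e\langle n\rangle\right] + \frac{\tau_p}{\tau_c}(R_p\tau_e R_e\tau_c)^2(1+\Sigma)\,\langle n\rangle(1-\langle n\rangle). \tag{A28}$$

We notice that the first term in this equation is $R_p\tau_p\langle e\rangle$, 
which is just the mean number of proteins $\langle p\rangle$ from Eq 
(A8). The second term is 

$$\tau_p(R_p\tau_e)^2 R_e\langle n\rangle = R_p\tau_p\,(R_e\tau_e\langle n\rangle)\,(R_p\tau_e) \tag{A29}$$

$$= R_p\tau_p\langle e\rangle\,(R_p\tau_e) \tag{A30}$$

$$= R_p\tau_e\langle p\rangle. \tag{A31}$$

Thus, the first two terms together contribute 
$(1 + R_p\tau_e)\langle p\rangle$ to the variance, and this corresponds to the 
output noise term in Eq (14). 

The third term in Eq (A28) contains the contribution 
of input noise to the variance in protein copy number. 
To simplify this term we note that the steady state of Eq 
(A2) is equivalent to 

$$k_+ c\,(1-\langle n\rangle) = k_-\langle n\rangle. \tag{A32}$$

Thus we can write 

$$\frac{1}{\tau_c} = k_+ c + k_- = k_-\frac{\langle n\rangle}{1-\langle n\rangle} + k_- \tag{A33}$$

$$= \frac{k_-}{1-\langle n\rangle}. \tag{A34}$$

The term we are interested in is 

$$\frac{\tau_p}{\tau_c}(R_p\tau_e R_e\tau_c)^2(1+\Sigma)\,\langle n\rangle(1-\langle n\rangle) = (R_p\tau_p R_e\tau_e)^2\,\frac{\tau_c}{\tau_p}(1+\Sigma)\,\langle n\rangle(1-\langle n\rangle) \tag{A35}$$

$$= p_0^2\,\frac{1}{k_-\tau_p}(1+\Sigma)\,\langle n\rangle(1-\langle n\rangle)^2 \tag{A36}$$

$$= p_0^2\left[\frac{1}{k_-\tau_p} + \frac{k_+(1-\langle n\rangle)}{\pi D a\,k_-\tau_p}\right]\langle n\rangle(1-\langle n\rangle)^2 \tag{A37}$$

$$= p_0^2\left[\frac{\langle n\rangle(1-\langle n\rangle)^2}{k_-\tau_p} + \frac{\langle n\rangle^2(1-\langle n\rangle)^2}{\pi D a c\,\tau_p}\right], \tag{A38}$$

where in the last step we once again use Eq (A32) to 
rewrite the ratio $k_+/k_-$ in terms of $\langle n\rangle$. We recognize 
the two terms in this result as the switching and diffusion 
terms in Eq (14). 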
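
The long-lifetime result can be compared against a direct numerical integral of the full spectrum, Eq (A23). The sketch below computes the output, switching and diffusion contributions in closed form and, separately, integrates the spectrum with a midpoint rule after a tangent substitution. Function names and parameter values are hypothetical; the protein lifetime is chosen to be 50 times longer than the mRNA lifetime, so the two numbers are expected to agree only up to corrections of a few percent.

```python
import math

def noise_terms(c, k_plus, k_minus, R_e, tau_e, R_p, tau_p, D, a):
    """Closed-form variance contributions in the limit of long tau_p."""
    n = k_plus * c / (k_plus * c + k_minus)          # Eq (A6)
    p0 = R_e * tau_e * R_p * tau_p                   # maximum mean expression
    return {
        "output": (1.0 + R_p * tau_e) * p0 * n,      # Eqs (A29)-(A31)
        "switching": p0 ** 2 * n * (1.0 - n) ** 2 / (k_minus * tau_p),
        "diffusion": p0 ** 2 * n ** 2 * (1.0 - n) ** 2
                     / (math.pi * D * a * c * tau_p),
    }

def variance_from_spectrum(c, k_plus, k_minus, R_e, tau_e, R_p, tau_p, D, a,
                           steps=20000):
    """Numerically integrate S_p(w) of Eq (A23) over dw / (2 pi)."""
    n = k_plus * c / (k_plus * c + k_minus)
    e = R_e * tau_e * n
    tau_c = 1.0 / (k_plus * c + k_minus)
    Sigma = k_plus * (1.0 - n) / (math.pi * D * a)
    N_p = 2.0 * R_p * e
    N_e = 2.0 * R_e * n
    N_n = 2.0 / tau_c * (1.0 + Sigma) * n * (1.0 - n)   # Eq (A26)

    def S_p(w):
        f_p = 1.0 / (w * w + 1.0 / tau_p ** 2)
        f_e = 1.0 / (w * w + 1.0 / tau_e ** 2)
        f_n = 1.0 / ((1.0 + Sigma) ** 2 * w * w + 1.0 / tau_c ** 2)
        return (N_p * f_p + R_p ** 2 * N_e * f_p * f_e
                + (R_p * R_e) ** 2 * N_n * f_p * f_e * f_n)

    # substitution w = tan(theta) / tau_p maps the frequency axis to (-pi/2, pi/2)
    dtheta = math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = -0.5 * math.pi + (j + 0.5) * dtheta
        w = math.tan(theta) / tau_p
        dw = dtheta / (tau_p * math.cos(theta) ** 2)
        total += S_p(w) * dw / (2.0 * math.pi)
    return total

params = dict(c=2.0, k_plus=0.5, k_minus=1.0, R_e=5.0, tau_e=1.0,
              R_p=2.0, tau_p=50.0, D=1.0, a=0.1)
terms = noise_terms(**params)
print(terms, sum(terms.values()), variance_from_spectrum(**params))
```

Increasing tau_p relative to the other time scales should bring the numerical integral closer to the closed-form sum.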



APPENDIX B 

To generalize this analysis of noise to cooperative interactions 
among transcription factors it is useful to think 
more intuitively about the two terms in Eq (A38), 
corresponding to switching and diffusion noise. Consider first 
the switching noise. 

We are looking at a binary variable $n$ such that the 
number of proteins is $p_0 n$. The total variance in $n$ must 
be $\langle(\delta n)^2\rangle = \langle n\rangle(1-\langle n\rangle)$ [Eq (A25)]. This noise fluctuates 
on a time scale $\tau_c$, so during the lifetime of the protein 
we see $N_s = \tau_p/\tau_c$ independent samples. The current 
protein concentration is effectively an average over these 
samples, so the effective variance is reduced to 

$$\langle(\delta n)^2\rangle_{\rm eff} = \frac{1}{N_s}\langle n\rangle(1-\langle n\rangle) = \frac{\tau_c}{\tau_p}\langle n\rangle(1-\langle n\rangle). \tag{B1}$$

Except for the factor of $p_0$ that converts $n$ into $p$, this is 
the first term in Eq (A38). 
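
The sample-counting argument can be spelled out in a few lines. This sketch, with hypothetical function names and values, uses the relation between the switching time and the off rate from Eq (A34) to show that dividing the binomial variance by the number of independent samples reproduces the switching term of Eq (A38), apart from the prefactor that converts occupancy into protein number.

```python
def switching_noise_by_averaging(n_mean, k_minus, tau_p):
    """Effective variance of n after averaging over N_s samples, Eq (B1)."""
    tau_c = (1.0 - n_mean) / k_minus        # switching time, from Eq (A34)
    n_samples = tau_p / tau_c               # N_s independent samples
    return n_mean * (1.0 - n_mean) / n_samples

def switching_noise_closed_form(n_mean, k_minus, tau_p):
    """First term of Eq (A38) without the prefactor p0^2."""
    return n_mean * (1.0 - n_mean) ** 2 / (k_minus * tau_p)

print(switching_noise_by_averaging(0.3, 1.0, 50.0),
      switching_noise_closed_form(0.3, 1.0, 50.0))
```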

Now if $h$ transcription factors bind cooperatively, we 
can still have two states, one in which transcription is 
possible and one in which it is blocked. For the case of 
activation, which we are considering here, the active state 
corresponds to all binding sites being filled, and so the 
rate at which the system leaves this state, $k_-$, shouldn't 
depend on the concentration of the transcription factors. 
The rate at which the system enters the active state does 
depend on concentration, but this doesn't matter, because 
with only two states we must always have an analog 
of Eq (A32), which allows us to eliminate the "on 
rate" in favor of $k_-$ and $\langle n\rangle$. The conclusion is that the 
first term in Eq (A38), corresponding to switching noise, 
is unchanged by cooperativity as long as the system is 
still well approximated as having just two states of 
transcriptional activity that depend on the potentially many 
more states of binding site occupancy. 

For the diffusion noise term we use the ideas of Refs 
[H, [H, HsJ . Diffusion noise should be thought of as an 
effective noise in the measurement of the concentration 
c, with a variance 



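$$\frac{\sigma_c^2}{c^2} = \frac{1}{\pi D a c\,\tau_p}, \tag{B2}$$

where again we identify the protein lifetime as the time 
over which the system averages. For the system with a 
single binding site, 

$$\langle n\rangle = \frac{c}{c + K_d}, \tag{B3}$$

so that 

$$\frac{\partial\langle n\rangle}{\partial c} = \frac{1}{c}\,\langle n\rangle(1-\langle n\rangle). \tag{B4}$$

The noise in concentration, together with this sensitivity 
of $n$ to changes in the concentration, should contribute a 
noise variance 

$$\langle(\delta n)^2\rangle_{\rm eff} = \left(\frac{\partial\langle n\rangle}{\partial c}\right)^2\sigma_c^2 = \frac{\langle n\rangle^2(1-\langle n\rangle)^2}{\pi D a c\,\tau_p}. \tag{B5}$$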



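This is (up to the factor of $p_0^2$) the second term in Eq 
(A38). Now the generalization to cooperative interactions 
is straightforward. If we have 

$$\langle n\rangle = \frac{c^h}{c^h + K_d^h}, \tag{B6}$$

then 

$$\frac{\partial\langle n\rangle}{\partial c} = \frac{h}{c}\,\langle n\rangle(1-\langle n\rangle). \tag{B7}$$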

Since the effective noise in concentration is unchanged 
[29], the only effect of cooperativity is to multiply the 
second term in Eq (A38) by a factor of $h^2$. 

Thus, in the expression [Eq (14)] for the variance of 
protein copy number, cooperativity has no effect on the 
switching noise but actually increases the diffusion noise 
by a factor of $h^2$. When written as a function of the mean 
copy number and the transcription factor concentration, 
this leaves the functional form of the variance fixed, only 
changing the coefficients. The overall effect is to make the 
contribution of diffusion noise more important. One way 
to say this is that, when we refer the noise in copy number 
back to the input, cooperativity causes the equivalent 
concentration noise to become closer to the limit Eq (B2) 
set by diffusive shot noise [29]. 
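
The factor of h^2 can be verified numerically. The sketch below assumes a Hill form for the mean occupancy, checks the sensitivity (h/c) times the occupancy times one minus the occupancy against a finite-difference derivative, and compares the diffusion contribution to the occupancy variance at half occupancy for h = 1 and h = 5. Function names, the finite-difference step and the parameter values are hypothetical.

```python
import math

def hill_occupancy(c, K_d, h):
    """Mean occupancy for h cooperatively binding factors (Hill form, assumed)."""
    return c ** h / (c ** h + K_d ** h)

def numerical_sensitivity(c, K_d, h, rel_step=1e-6):
    """Central finite-difference estimate of the derivative of occupancy w.r.t. c."""
    dc = c * rel_step
    return (hill_occupancy(c + dc, K_d, h)
            - hill_occupancy(c - dc, K_d, h)) / (2.0 * dc)

def diffusion_noise_in_occupancy(c, K_d, h, D, a, tau_p):
    """Sensitivity squared times the effective concentration variance of Eq (B2)."""
    n = hill_occupancy(c, K_d, h)
    var_c = c ** 2 / (math.pi * D * a * c * tau_p)    # Eq (B2)
    sensitivity = h / c * n * (1.0 - n)               # Eq (B7)
    return sensitivity ** 2 * var_c

c, K_d = 1.0, 1.0    # half occupancy for any h
for h in (1, 5):
    n = hill_occupancy(c, K_d, h)
    print(h, numerical_sensitivity(c, K_d, h), h / c * n * (1.0 - n))
ratio = (diffusion_noise_in_occupancy(c, K_d, 5, 1.0, 0.1, 50.0)
         / diffusion_noise_in_occupancy(c, K_d, 1, 1.0, 0.1, 50.0))
print(ratio)
```

At fixed mean occupancy the ratio of the two diffusion terms is h^2, while nothing in the switching term depends on h.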

Reference [33] also considers the possibility that noise 
is reduced by averaging among neighboring nuclei. This 
does not change the form of any of the noise terms, but 
does change the microscopic interpretation of the 
coefficients $\alpha$ and $\beta$. For example, averaging for a time $\tau_p$ 
over $N$ nuclei is equivalent to having one nucleus with an 
averaging time $N\tau_p$. 